Google Cloud Datalab provides an easy environment for working with your data, including data managed in Google Cloud Storage. This notebook introduces some of the APIs that Datalab provides for working with Cloud Storage.
You've already seen the use of %%gcs commands in the Storage Commands notebook. These commands are built using the same Storage APIs that are available for your own use.
For context, items or files held in Cloud Storage are called objects. Objects are immutable once written. They are organized into buckets, and each object within a bucket is identified by a unique key.
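To make the bucket/key structure concrete, here is a small sketch in plain Python (no Cloud Storage client required) showing how a `gs://` URI breaks down into a bucket name and an object key. The helper name `split_gs_uri` is ours for illustration, not part of the Datalab API:

```python
def split_gs_uri(uri):
    # A gs:// URI has the form gs://<bucket>/<key>. The key may itself
    # contain '/' characters, but these are only a naming convention:
    # buckets have no real directory hierarchy.
    assert uri.startswith('gs://')
    bucket, _, key = uri[len('gs://'):].partition('/')
    return bucket, key

bucket, key = split_gs_uri('gs://cloud-datalab-samples/httplogs/logs20140615.csv')
print(bucket)  # cloud-datalab-samples
print(key)     # httplogs/logs20140615.csv
```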
In [1]:
import google.datalab.storage as storage
First, we will get our project ID and use it to construct a bucket name; a random suffix helps avoid name collisions, since bucket names are globally unique. Run this code in your own project:
In [4]:
from google.datalab import Context
import random, string
project = Context.default().project_id
suffix = ''.join(random.choice(string.ascii_lowercase) for _ in range(5))
sample_bucket_name = project + '-datalab-samples-' + suffix
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'
print('Bucket: ' + sample_bucket_path)
print('Object: ' + sample_bucket_object)
The shared cloud-datalab-samples bucket contains sample data. The snippet below lists its top-level objects, i.e. those whose keys contain no '/':

In [5]:
shared_bucket = storage.Bucket('cloud-datalab-samples')
for obj in shared_bucket.objects():
  if obj.key.find('/') < 0:
    print(obj.key)
Since a bucket may contain many objects, you can also filter them while enumerating by supplying a prefix and delimiter.
In [6]:
for obj in shared_bucket.objects(prefix='httplogs/', delimiter='/'):
  print(obj.key)
Next, create the sample bucket and confirm that it exists:

In [7]:
sample_bucket = storage.Bucket(sample_bucket_name)
sample_bucket.create()
sample_bucket.exists()
Out[7]:
With the bucket in place, write a sample text object into it:

In [8]:
sample_object = sample_bucket.object('sample.txt')
sample_object.write_stream('Some sample text', 'text/plain')
List the bucket's contents to confirm the object was written:

In [14]:
list(sample_bucket.objects())
Object metadata, such as size, is also available:

In [10]:
sample_object.metadata.size
Out[10]:
Read the object's contents back:

In [11]:
sample_text = sample_object.read_stream()
print(sample_text)
In [12]:
sample_object.exists()
Out[12]:
Finally, clean up by deleting both the object and the bucket:

In [13]:
sample_object.delete()
sample_bucket.delete()